Lexical Sememe Prediction via Word Embeddings and Matrix Factorization
نویسندگان
چکیده
Sememes are defined as the minimum semantic units of human languages. People have manually annotated lexical sememes for words and form linguistic knowledge bases. However, manual construction is time-consuming and labor-intensive, with significant annotation inconsistency and noise. In this paper, we for the first time explore to automatically predict lexical sememes based on semantic meanings of words encoded by word embeddings. Moreover, we apply matrix factorization to learn semantic relations between sememes and words. In experiments, we take a real-world sememe knowledge base HowNet for training and evaluation, and the results reveal the effectiveness of our method for lexical sememe prediction. Our method will be of great use for annotation verification of existing noisy sememe knowledge bases and annotation suggestion of new words and phrases.
منابع مشابه
Online Learning of Interpretable Word Embeddings
Word embeddings encode semantic meanings of words into low-dimension word vectors. In most word embeddings, one cannot interpret the meanings of specific dimensions of those word vectors. Nonnegative matrix factorization (NMF) has been proposed to learn interpretable word embeddings via non-negative constraints. However, NMF methods suffer from scale and memory issue because they have to mainta...
متن کاملWord Embeddings via Tensor Factorization
Most popular word embedding techniques involve implicit or explicit factorization of a word co-occurrence based matrix into low rank factors. In this paper, we aim to generalize this trend by using numerical methods to factor higher-order word co-occurrence based arrays, or tensors. We present four word embeddings using tensor factorization and analyze their advantages and disadvantages. One of...
متن کاملWeak Supervision for Semi-supervised Topic Modeling via Word Embeddings
Semi-supervised algorithms have been shown to improve the results of topic modeling when applied to unstructured text corpora. However, sufficient supervision is not always available. This paper proposes a new process, Weak+, suitable for use in semi-supervised topic modeling via matrix factorization, when limited supervision is available. This process uses word embeddings to provide additional...
متن کاملLearning Cross-lingual Word Embeddings via Matrix Co-factorization
A joint-space model for cross-lingual distributed representations generalizes language-invariant semantic features. In this paper, we present a matrix cofactorization framework for learning cross-lingual word embeddings. We explicitly define monolingual training objectives in the form of matrix decomposition, and induce cross-lingual constraints for simultaneously factorizing monolingual matric...
متن کاملTailoring Word Embeddings for Bilexical Predictions: An Experimental Comparison
We investigate the problem of inducing word embeddings that are tailored for a particular bilexical relation. Our learning algorithm takes an existing lexical vector space and compresses it such that the resulting word embeddings are good predictors for a target bilexical relation. In experiments we show that task-specific embeddings can benefit both the quality and efficiency in lexical predic...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2017